fix: cap embed payload at 30k chars to prevent Ollama context overflow (#53)#54
Conversation
- Added an "embedded skills" system using go:embed for Markdown files (no hardcoding).
- Implemented skills management (list_skills, install_skill) with dynamic metadata detection.
- Updated the update system with 24h caching and non-blocking checks on tool calls.
- Added a library of 6 base skills (Go, Python, Laravel, Debugging, etc.).
- Added unit tests for the internal/skills package.
- Updated .gitignore and the auto-generated IDE rules to support the new system.
- Fix skills path traversal validation
- Fix race condition in update check
- Add Windows zip support for updates
- Improve test cleanup
- Add missing tool schemas
Add buildEmbedText() helper that caps embed payload at 30 000 chars (~7 700 tokens, safe for 8 192-token models like nomic-embed-text). Metadata (docstring + signature) is always preserved in full; only the Code body is truncated when the total exceeds the limit. A WARN is logged whenever truncation occurs so users can identify large symbols. Fixes #53
Pull request overview
This PR addresses embedding failures caused by oversized embed payloads during indexing (Ollama context overflow), and also introduces a bundled “skills” system plus update-check/apply tooling with caching.
Changes:
- Cap embedding input via `buildEmbedText()` (30k Unicode chars) and emit warnings when truncation occurs.
- Add an updater cache (24h TTL) and expose update/skill management as MCP tools (`check_update`, `apply_update`, `list_skills`, `install_skill`).
- Improve file-write correctness in a few places by explicitly handling `Close()` errors, and adjust `.gitignore` for new skill install output.
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| internal/workspace/state.go | Switch Save() to a write lock and improve close/error handling on state persistence. |
| internal/updater/updater.go | Add update caching, context-aware HTTP, zip extraction, safer atomic cache writes, and Windows fallback behavior. |
| internal/updater/updater_test.go | Tests for updater cache path creation and save/load behavior. |
| internal/tools/updates.go | New MCP tools to check for updates and apply updates. |
| internal/tools/list_skills.go | New MCP tool to list bundled skills. |
| internal/tools/install_skill.go | New MCP tool to install/uninstall a skill into a workspace. |
| internal/skills/skills.go | Embed and manage bundled skills; install/uninstall logic and metadata parsing. |
| internal/skills/skills_test.go | Tests for skill listing, install/uninstall, and path traversal protection. |
| internal/skills/embedded/ragcode-update/SKILL.md | New embedded skill documentation for updates/maintenance workflow. |
| internal/skills/embedded/ragcode-priority/examples/search_patterns.md | New examples for recommended search workflows. |
| internal/skills/embedded/ragcode-priority/SKILL.md | New embedded skill that enforces “ragcode tools first” workflow. |
| internal/skills/embedded/python-best-practices/SKILL.md | New embedded skill for Python guidelines. |
| internal/skills/embedded/php-laravel/SKILL.md | New embedded skill for PHP/Laravel guidelines. |
| internal/skills/embedded/go-best-practices/SKILL.md | New embedded skill for Go best practices. |
| internal/skills/embedded/debugging-guide/SKILL.md | New embedded skill for debugging workflow. |
| internal/ragcode/indexer.go | Add capped embed text builder and warning logging on truncation. |
| internal/ragcode/indexer_embed_test.go | Tests for embed text truncation behavior and boundaries. |
| cmd/rag-code-mcp/main.go | Wire new tools, update CLI update path, and add background update checks + schemas. |
| cmd/install/main.go | Adjust install path construction and improve close/error handling during file operations. |
| .gitignore | Stop ignoring all of .agent/; ignore only .agent/skills/ and add debug log ignores. |
```go
var full string
if ch.Code != "" {
	if meta != "" {
		full = meta + "\n\n" + ch.Code
	} else {
		full = ch.Code
	}
} else {
	full = meta
}

runes := []rune(full)
if len(runes) <= maxChars {
	return full, false
}

// Truncate only the Code portion — keep metadata intact.
metaWithSep := meta
if meta != "" && ch.Code != "" {
	metaWithSep = meta + "\n\n"
}
metaRunes := []rune(metaWithSep)
remaining := maxChars - len(metaRunes)
if remaining < 0 {
	remaining = 0
}
codeRunes := []rune(ch.Code)
if remaining > len(codeRunes) {
	remaining = len(codeRunes)
}
truncated := metaWithSep + string(codeRunes[:remaining])
return truncated, true
```
buildEmbedText() eagerly constructs full = meta + "\n\n" + ch.Code and then later converts the whole string to []rune. For very large ch.Code values this forces a full copy/allocation (and a full rune slice) even though the result will be truncated, which can significantly increase memory/CPU during indexing. Prefer computing rune lengths and truncating the code portion without concatenating/allocating the full payload up front (e.g., build only the needed prefix up to maxChars).
Suggested change (compute rune counts and build only the needed prefix, without concatenating the full payload up front):

```go
metaWithSep := meta
if meta != "" && ch.Code != "" {
	metaWithSep = meta + "\n\n"
}
metaRuneCount := len([]rune(metaWithSep))
if ch.Code == "" {
	if metaRuneCount <= maxChars {
		return meta, false
	}
	return meta, true
}
remaining := maxChars - metaRuneCount
if remaining < 0 {
	return metaWithSep, true
}
codeRuneCount := 0
for idx := range ch.Code {
	if codeRuneCount == remaining {
		return metaWithSep + ch.Code[:idx], true
	}
	codeRuneCount++
}
return metaWithSep + ch.Code, false
```
```go
}
if wasTruncated {
	// Log a warning so users know which symbols were partially indexed.
	// The full code is still accessible via rag_read_file_context.
```
The comment references rag_read_file_context, but that tool name doesn't appear to exist in this codebase (the MCP tool is get_code_context). This can confuse users reading logs; consider updating the comment to the correct tool name or removing it.
Suggested change:

```diff
-// The full code is still accessible via rag_read_file_context.
+// The full code is still accessible via get_code_context.
```
```go
"properties": map[string]interface{}{
	"force": map[string]interface{}{
		"type":        "boolean",
		"description": "Force update even if version matches (default: false)",
```
Tool schema for apply_update documents force as defaulting to false, but ApplyUpdateTool.Execute defaults force to true when the argument is omitted. This mismatch can lead to surprising behavior for MCP clients relying on the schema. Align the schema description with the implementation, or change the tool default to match the documented default.
Suggested change:

```diff
-"description": "Force update even if version matches (default: false)",
+"description": "Force update even if version matches (default: true)",
```
```diff
@@ -655,6 +664,10 @@ func main() {
 	registerAgentTool(server, searchDocsTool, cfg)
 	registerAgentTool(server, hybridTool, cfg)
 	registerAgentTool(server, indexWorkspaceTool, cfg)
+	registerAgentTool(server, listSkillsTool, cfg)
+	registerAgentTool(server, installSkillTool, cfg)
+	registerAgentTool(server, checkUpdateTool, cfg)
+	registerAgentTool(server, applyUpdateTool, cfg)
```
The PR description/title focuses on embedding payload truncation (#53), but this change set also adds a skills system and updater tools (list_skills/install_skill/check_update/apply_update) plus background update checks. This looks like a significant scope increase that isn't reflected in the PR metadata; please update the PR description (and possibly the title) to cover these additions, or split the updater/skills changes into a separate PR to keep review and rollback risk manageable.
```go
metaRunes := []rune(metaWithSep)
remaining := maxChars - len(metaRunes)
if remaining < 0 {
	remaining = 0
}
```
buildEmbedText() does not guarantee the returned text stays within maxChars when the metadata (docstring/signature) alone exceeds the limit. In that case metaWithSep is returned in full and can still overflow the embedding model context window, reintroducing the original failure mode. Consider enforcing the cap unconditionally (e.g., always keep signature, truncate docstring next, then code) and add a test where Docstring length > maxChars.
Code Review
This pull request implements a new Skills System to provide contextual AI behaviors and enhances the update mechanism with local caching and Windows ZIP support. It also introduces a truncation strategy for code indexing to ensure chunks fit within model context windows. Review feedback highlights the need to ensure temporary update files are cleaned up before os.Exit, suggests memory optimizations for rune counting and string slicing in the indexer, and notes a potential overflow issue if metadata alone exceeds the character limit.
```go
	log.Fatalf("Failed to close temporary file for update: %v", err)
}
// Ensure the temporary file is removed after applying the update
defer os.Remove(tempFile)
```
The defer os.Remove(tempFile) will not execute because os.Exit(0) is called at the end of this block (line 470). In Go, defer statements are not run when os.Exit() is called. Since this is a short-lived CLI path, you should manually call os.Remove(tempFile) before os.Exit(0) or wrap this logic in a function to ensure the temporary archive is cleaned up.
```go
"encoding/json"
"fmt"
"hash/fnv"
"log"
```
```go
runes := []rune(full)
if len(runes) <= maxChars {
	return full, false
}
```
```go
codeRunes := []rune(ch.Code)
if remaining > len(codeRunes) {
	remaining = len(codeRunes)
}
```
Converting the potentially massive ch.Code string to []rune can lead to high memory usage or OOM for very large files. Since you only need a prefix of at most remaining runes, you can optimize this by converting only a safe byte-prefix of the string.
```go
codePrefix := ch.Code
if len(codePrefix) > remaining*4 {
	codePrefix = codePrefix[:remaining*4]
}
codeRunes := []rune(codePrefix)
if remaining > len(codeRunes) {
	remaining = len(codeRunes)
}
```

```go
if remaining > len(codeRunes) {
	remaining = len(codeRunes)
}
truncated := metaWithSep + string(codeRunes[:remaining])
```
If the metadata (metaWithSep) alone exceeds maxChars, the remaining count becomes 0, but the full metadata is still returned. This means the payload could still exceed the model's context window. While metadata is typically small, a strict cap would be safer to prevent the context overflow error mentioned in the PR description.
Description

Fixes the "llm embedding error: the input length exceeds the context length" error reported by users indexing large codebases with Ollama.

Root cause: `IndexPaths()` was concatenating docstring + signature + code into a single embed payload with no size limit. For large symbols (big classes, generated files, long functions), this easily exceeds the context window of embedding models like `nomic-embed-text` (8 192 tokens).

Fix: Added a `buildEmbedText()` helper that caps the embed payload at 30 000 characters (~7 700 tokens, giving ~6% safety headroom). Metadata (docstring + signature) is always preserved in full; only the raw code body is truncated when the total exceeds the limit. A `[WARN]` log entry is emitted whenever truncation occurs, so users can identify oversized symbols without an indexing failure.

Fixes #53
Type of change
Checklist:
- Ran `go fmt ./...`
- Ran `go test ./...` and the tests pass